2 research outputs found

    New approaches to data access in large-scale distributed system

    Get PDF
    Mención Internacional en el título de doctorA great number of scientific projects need supercomputing resources, such as, for example, those carried out in physics, astrophysics, chemistry, pharmacology, etc. Most of them generate, as well, a great amount of data; for example, a some minutes long experiment in a particle accelerator generates several terabytes of data. In the last years, high-performance computing environments have evolved towards large-scale distributed systems such as Grids, Clouds, and Volunteer Computing environments. Managing a great volume of data in these environments means an added huge problem since the data have to travel from one site to another through the internet. In this work a novel generic I/O architecture for large-scale distributed systems used for high-performance and high-throughput computing will be proposed. This solution is based on applying parallel I/O techniques to remote data access. Novel replication and data search schemes will also be proposed; schemes that, combined with the above techniques, will allow to improve the performance of those applications that execute in these environments. In addition, it will be proposed to develop simulation tools that allow to test these and other ideas without needing to use real platforms due to their technical and logistic limitations. An initial prototype of this solution has been evaluated and the results show a noteworthy improvement regarding to data access compared to existing solutions.Un gran número de proyectos científicos necesitan recursos de supercomputación como, por ejemplo, los llevados a cabo en física, astrofísica, química, farmacología, etc. Muchos de ellos generan, además, una gran cantidad de datos; por ejemplo, un experimento de unos minutos de duración en un acelerador de partículas genera varios terabytes de datos. Los entornos de computación de altas prestaciones han evolucionado en los últimos años hacia sistemas distribuidos a gran escala tales como Grids, Clouds y entornos de computación voluntaria. En estos entornos gestionar un gran volumen de datos supone un problema añadido de importantes dimensiones ya que los datos tienen que viajar de un sitio a otro a través de internet. En este trabajo se propondrá una nueva arquitectura de E/S genérica para sistemas distribuidos a gran escala usados para cómputo de altas prestaciones y de alta productividad. Esta solución se basa en la aplicación de técnicas de E/S paralela al acceso remoto a los datos. Así mismo, se estudiarán y propondrán nuevos esquemas de replicación y búsqueda de datos que, en combinación con las técnicas anteriores, permitan mejorar las prestaciones de aquellas aplicaciones que ejecuten en este tipo de entornos. También se propone desarrollar herramientas de simulación que permitan probar estas y otras ideas sin necesidad de recurrir a una plataforma real debido a las limitaciones técnicas y logísticas que ello supone. Se ha evaluado un prototipo inicial de esta solución y los resultados muestran una mejora significativa en el acceso a los datos sobre las soluciones existentes.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: David Expósito Singh.- Secretario: María de los Santos Pérez Hernández.- Vocal: Juan Manuel Tirado Mart

    SkyCDS: A resilient content delivery service based on diversified cloud storage

    Get PDF
    Cloud-based storage is a popular outsourcing solution for organizations to deliver contents to end-users. However, there is a need for contingency plans to ensure service provision when the provider either suffers outages or is going out of business. This paper presents SkyCDS: a resilient content delivery service based on a publish/subscribe overlay over diversified cloud storage. SkyCDS splits the content delivery into metadata and content storage flow layers. The metadata flow layer is based on publish-subscribe patterns for insourcing the metadata control back to content owner. The storage layer is based on dispersal information over multiple cloud locations with which organizations outsource content storage in a controlled manner. In SkyCDS, the content dispersion is performed on the publisher side and the content retrieving process on the end-user side (the subscriber), which reduces the load on the organization side only to metadata management. SkyCDS also lowers the overhead of the content dispersion and retrieving processes by taking advantage of multi-core technology. A new allocation strategy based on cloud storage diversification and failure masking mechanisms minimize side effects of temporary, permanent cloud-based service outages and vendor lock-in. We developed a SkyCDS prototype that was evaluated by using synthetic workloads and a study case with real traces. Publish/subscribe queuing patterns were evaluated by using a simulation tool based on characterized metrics taken from experimental evaluation. The evaluation revealed the feasibility of SkyCDS in terms of performance, reliability and storage space profitability. It also shows a novel way to compare the storage/delivery options through risk assessment. (C) 2015 Elsevier B.V. All rights reserved.The work presented in this paper has been partially supported by EU under the COST programme Action IC1305, Network for Sustainable Ultrascale Computing (NESUS)
    corecore